Smart Paper Notes

Computer vision based vehicle tracking as a complementary and scalable approach to RFID tagging

Pranav Kant Gaur , Abhilash Bhardwaj , Pritam Shete , Mohini Laghate , Dinesh M Sarode

Category: Computer Vision

2022-09-13

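Logging incoming and outgoing vehicles provides critical information for root-cause analysis when combating security breach incidents in various sensitive organizations. RFID tagging hampers the scalability of vehicle tracking solutions on both the logistical and the technical fronts. For instance, requiring every incoming vehicle (departmental or private) to be RFID tagged is a severe constraint, and coupling video analytics with RFID to detect abnormal vehicle movement is non-trivial. We leverage publicly available implementations of computer vision algorithms to develop an interpretable vehicle tracking algorithm using the finite-state machine formalism. The state machine consumes input from cascaded object detection and optical character recognition (OCR) models for its state transitions. We evaluated the proposed method on 75 video clips of 285 vehicles from our system deployment site. We observed that the detection rate is most affected by the speed and the type of vehicle. The highest detection rate is achieved when vehicle movement at the checkpoint is restricted in a manner similar to RFID tagging. We further analyzed 700 vehicle tracking predictions on live data and found that the majority of vehicle number prediction errors are due to illegible text, image blur, text occlusion, and out-of-vocabulary letters. Towards system deployment and performance enhancement, we expect our ongoing system monitoring to provide evidence for establishing a higher vehicle-throughput standard operating procedure (SOP) at the security checkpoint, and to drive fine-tuning of the deployed computer vision models and the state machine, so as to establish the proposed approach as a promising alternative to RFID tagging.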
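
The state-machine idea in the abstract above can be illustrated with a minimal Python sketch. The state names, the `Observation` fields, and the transition rules are hypothetical and are not taken from the paper; they only show how per-frame detector and OCR outputs might drive interpretable state transitions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()          # no vehicle in view
    VEHICLE_SEEN = auto()  # detector reports a vehicle, plate not yet read
    PLATE_READ = auto()    # OCR has produced a plate string
    LOGGED = auto()        # vehicle left the frame after its plate was read


@dataclass
class Observation:
    vehicle_detected: bool     # hypothetical object-detector output
    plate_text: Optional[str]  # hypothetical OCR output, None if unreadable


def step(state, obs):
    """Return the next state for one frame-level observation."""
    if not obs.vehicle_detected:
        # Vehicle gone: log it only if a plate was read, otherwise reset.
        return State.LOGGED if state is State.PLATE_READ else State.IDLE
    if state is State.PLATE_READ or obs.plate_text:
        # Once a plate has been read, keep it until the vehicle leaves.
        return State.PLATE_READ
    return State.VEHICLE_SEEN


def run(observations):
    """Feed observations through the machine and return the visited states."""
    state = State.IDLE
    trace = []
    for obs in observations:
        state = step(state, obs)
        trace.append(state)
    return trace
```

Because every transition is an explicit rule, a wrong log entry can be traced back to the frame and the model output that caused it, which is the sense in which such a tracker is interpretable.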

Exploration of an End-to-End Automatic Number-plate Recognition neural network for Indian datasets

Sai Sirisha Nadiminti , Pranav Kant Gaur , Abhilash Bhardwaj

Category: Computer Vision

2022-07-14

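Indian vehicle number plates vary widely in size, font, script, and shape. Developing Automatic Number Plate Recognition (ANPR) solutions is therefore challenging and requires a diverse dataset to serve as a collection of examples. However, a comprehensive dataset for the Indian scenario is missing, which hampers progress towards publicly available and reproducible ANPR solutions. Many countries have invested effort in developing comprehensive ANPR datasets, such as the Chinese City Parking Dataset (CCPD) for China and the Application-Oriented License Plate (AOLP) dataset. In this work, we release an expanding dataset that currently consists of 1.5K images, together with a scalable and reproducible procedure for enhancing this dataset towards the development of ANPR solutions for Indian conditions. We use this dataset to explore an end-to-end (E2E) ANPR architecture for the Indian scenario that was originally proposed for Chinese vehicle number-plate recognition based on the CCPD dataset. While customizing the architecture for our dataset, we came across insights that we discuss in this paper. We report the obstacles to directly reusing the model provided by the CCPD authors, which stem from the extreme diversity of Indian number plates and from distribution differences with respect to the CCPD dataset. After aligning the characteristics of the Indian dataset with those of the Chinese dataset, an improvement of 42.86% was observed in LP detection. We also compare the performance of the E2E number-plate detection model with a YOLOv5 model pre-trained on the COCO dataset and fine-tuned on Indian vehicle images. Given that the number of Indian vehicle images used to fine-tune the detection module and YOLOv5 was the same, we conclude that it is more efficient to develop an ANPR solution for Indian conditions based on the COCO dataset rather than the CCPD dataset.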

Node-Element Hypergraph Message Passing for Fluid Dynamics Simulations

Rui Gao , Indu Kant Deo , Rajeev K. Jaiman

Category: Machine Learning

2022-12-30

A recent trend in deep learning research features the application of graph neural networks for mesh-based continuum mechanics simulations. Most of these frameworks operate on graphs in which each edge connects two nodes. Inspired by the data connectivity in the finite element method, we connect the nodes by elements rather than edges, effectively forming a hypergraph. We implement a message-passing network on such a node-element hypergraph and explore the capability of the network for the modeling of fluid flow. The network is tested on two common benchmark problems, namely the fluid flow around a circular cylinder and airfoil configurations. The results show that such a message-passing network defined on the node-element hypergraph is able to generate more stable and accurate temporal roll-out predictions compared to the baseline generalized message-passing network defined on a normal graph. Along with adjustments in activation function and training loss, we expect this work to set a new strong baseline for future explorations of mesh-based fluid simulations with graph neural networks.
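
A minimal Python sketch of one node-element message-passing round may help make the connectivity concrete. It is an illustration only: the learned update functions of a real network are replaced by plain averaging, node features are scalars, and all function names are hypothetical rather than taken from the paper.

```python
def node_to_element(node_feats, elements):
    """Aggregate node features into one feature per element (mean over its nodes)."""
    return [sum(node_feats[n] for n in elem) / len(elem) for elem in elements]


def element_to_node(elem_feats, elements, num_nodes):
    """Send each element's feature back to its nodes and average per node."""
    totals = [0.0] * num_nodes
    counts = [0] * num_nodes
    for feat, elem in zip(elem_feats, elements):
        for n in elem:
            totals[n] += feat
            counts[n] += 1
    # Nodes that belong to no element receive a zero message.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def message_passing_round(node_feats, elements):
    """One round: nodes -> elements -> nodes, followed by a residual update."""
    elem_feats = node_to_element(node_feats, elements)
    messages = element_to_node(elem_feats, elements, len(node_feats))
    # A trained model would apply learned functions here instead of a plain sum.
    return [x + m for x, m in zip(node_feats, messages)]
```

Here each element is a tuple of node indices (for example, the three vertices of a triangular mesh cell), so a single hyperedge couples all of its nodes at once instead of through pairwise edges.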

From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Pranav Kulkarni , Adway Kanhere , Paul H. Yi , Vishwa S. Parekh

Category: Computer Vision | Machine Learning

2022-11-11

Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited utility in clinical use because of their narrow focus on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. Specifically, we train a single FL classification model (`global`) using two separate CXR datasets -- one annotated for presence of pneumonia and the other for presence of pneumothorax (two common and life-threatening conditions) -- capable of diagnosing both. We compare the performance of the global FL model with models trained separately on both datasets (`baseline`) for two different model architectures. On a standard, naive 3-layer CNN architecture, the global FL model achieved AUROC of 0.84 and 0.81 for pneumonia and pneumothorax, respectively, compared to 0.85 and 0.82, respectively, for both baseline models (p>0.05). Similarly, on a pretrained DenseNet121 architecture, the global FL model achieved AUROC of 0.88 and 0.91 for pneumonia and pneumothorax, respectively, compared to 0.89 and 0.91, respectively, for both baseline models (p>0.05). Our results suggest that FL can be used to create global `meta` models to make toy datasets from Kaggle clinically useful, a step forward towards bridging the gap from bench to bedside.

Federated Learning Using Three-Operator ADMM

Shashi Kant , José Mairton B. da Silva Jr. , Gabor Fodor , Bo Göransson , Mats Bengtsson , Carlo Fischione

Category: Machine Learning

2022-11-08

Federated learning (FL) has emerged as an instance of the distributed machine learning paradigm that avoids the transmission of data generated on the users' side. Although data are not transmitted, edge devices have to deal with limited communication bandwidths, data heterogeneity, and straggler effects due to the limited computational resources of users' devices. A prominent approach to overcome such difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). The common assumption of FL algorithms, including FedADMM, is that they learn a global model using data only on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than utilizing only user datasets. Specifically, we show that the mere application of FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model parallel to the edge devices. Our numerical experiments indicate that FedTOP-ADMM achieves a substantial gain of up to 33% in communication efficiency to reach a desired test accuracy with respect to FedADMM, including a virtual user on the edge server.
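
For orientation, the classical two-operator consensus ADMM mentioned above can be sketched on a toy problem. This is not FedADMM or FedTOP-ADMM: it is the textbook scaled-form consensus iteration, with each hypothetical "user" holding a scalar quadratic cost 0.5 * (x - a_i)^2, chosen as an assumption so that every update has a closed form.

```python
def consensus_admm(local_targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z.

    local_targets holds the a_i values, one per user. Returns the consensus z.
    """
    n = len(local_targets)
    x = [0.0] * n  # local models, one per user
    u = [0.0] * n  # scaled dual variables, one per user
    z = 0.0        # global (consensus) model held by the server
    for _ in range(iterations):
        # Local step: each user minimises its own cost plus the proximal term.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_targets, u)]
        # Server step: average the local models corrected by their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each user's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z
```

For this toy cost the consensus value converges to the mean of the local targets, which makes the iteration easy to check by hand; only `z` and the per-user `x + u` quantities would need to be exchanged between users and server.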

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Yashesh Gaur , Nick Kibre , Jian Xue , Kangyuan Shu , Yuhui Wang , Issac Alphanso , Jinyu Li , Yifan Gong

Category: Natural Language Processing | Artificial Intelligence

2022-11-07

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted Finite State Transducers (WFST) have been employed to do ITN. WFSTs are nicely suited to this task but their size and run-time costs can make deployment on embedded applications challenging. In this paper, we describe the development of an on-device ITN system that is streaming, lightweight & accurate. At the core of our system is a streaming transformer tagger, that tags lexical tokens from ASR. The tag informs which ITN category might be applied, if at all. Following that, we apply an ITN-category-specific WFST, only on the tagged text, to reliably perform the ITN conversion. We show that the proposed ITN solution performs equivalent to strong baselines, while being significantly smaller in size and retaining customization capabilities.

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Peidong Wang , Eric Sun , Jian Xue , Yu Wu , Long Zhou , Yashesh Gaur , Shujie Liu , Jinyu Li

Category: Natural Language Processing

2022-11-05

End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it easy to use a single model for both multilingual ASR and many-to-many ST. In this paper, we propose streaming language-agnostic multilingual speech recognition and translation using neural transducers (LAMASSU). To enable multilingual text generation in LAMASSU, we conduct a systematic comparison between specified and unified prediction and joint networks. We leverage a language-agnostic multilingual encoder that substantially outperforms shared encoders. To enhance LAMASSU, we propose to feed target LID to encoders. We also apply connectionist temporal classification regularization to transducer training. Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models.

A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing

Pranav Shetty , Arunkumar Chitteth Rajan , Christopher Kuenneth , Sonkakshi Gupta , Lakshmi Prerana Panchumarti , Lauren Holm , Chao Zhang , Rampi Ramprasad

Category: Natural Language Processing

2022-09-27

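The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from published literature. We use natural language processing (NLP) methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, on 2.4 million materials science abstracts; when used as the text encoder, it outperforms other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained about 300,000 material property records from about 130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications, such as fuel cells, supercapacitors, and polymer solar cells, to recover non-trivial insights. The data extracted through our pipeline is made available through a web platform at https://polymerscholar.org, which can be used to conveniently locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with a complete set of extracted material property information.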

Just-In-Time Learning for Operational Risk Assessment in Power Grids

Oliver Stover , Pranav Karve , Sankaran Mahadevan , Wenbo Chen , Haoruo Zhao , Mathieu Tanneau , Pascal Van Hentenryck

Category: Machine Learning

2022-09-26

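In a grid with a significant share of renewable generation, operators will need additional tools to evaluate operational risk due to the increased volatility in load and generation. The computational requirements of the forward uncertainty propagation problem, which must solve numerous security-constrained economic dispatch (SCED) optimizations, are a major barrier to such real-time risk assessment. This paper proposes a Just-In-Time Risk Assessment Learning Framework (JITRALF) as an alternative. JITRALF trains risk surrogates, one for each hour of the day, using machine learning (ML) to predict the quantities needed to estimate risk without explicitly solving the SCED problem. This significantly reduces the computational burden of forward uncertainty propagation and allows for fast, real-time risk estimation. The paper also proposes a novel asymmetric loss function and shows that models trained with the asymmetric loss perform better than those trained with symmetric loss functions. JITRALF is evaluated on the French transmission system for assessing the risk of insufficient operating reserves, the risk of load shedding, and the expected operating cost.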
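
To illustrate what an asymmetric loss does, the sketch below uses the well-known pinball (quantile) loss as a stand-in. The abstract does not specify the form of the paper's own loss, so this choice, the function name, and the `tau` parameter are assumptions made purely for illustration.

```python
def pinball_loss(y_true, y_pred, tau=0.9):
    """Mean pinball loss over paired targets and predictions.

    With tau > 0.5, under-prediction (y_pred < y_true) costs more than
    over-prediction of the same size; tau = 0.5 gives a symmetric loss.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Positive diff means the prediction was too low.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)
```

In a risk-assessment setting, penalising under-prediction more heavily is one plausible motivation for asymmetry, since underestimating a risk quantity is usually costlier than overestimating it.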

EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning

Ravi Kant Gupta , Shivani Nandgaonkar , Nikhil Cherian Kurian , Swapnil Rane , Amit Sethi

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-08-26

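The standard diagnostic procedure for targeted therapies in lung cancer treatment involves histological subtyping and the subsequent detection of key driver mutations, such as EGFR. Although molecular profiling can uncover driver mutations, the process is often expensive and time-consuming. Deep learning-based image analysis offers a more economical alternative for discovering driver mutations directly from whole slide images (WSIs). In this work, we use a customized deep learning pipeline with weak supervision to identify the morphological correlates of EGFR mutation in hematoxylin and eosin-stained WSIs, in addition to detecting tumors and their histological subtypes. We demonstrate the effectiveness of the pipeline through rigorous experiments and ablation studies on two lung cancer datasets: TCGA and a private dataset from India. With this pipeline, we report the average area under the curve (AUC) for tumor detection and an average AUC of 0.942 for histological subtyping between adenocarcinoma and squamous cell carcinoma on the TCGA dataset. For EGFR detection, we achieved an average AUC of 0.864 on the TCGA dataset and 0.783 on the dataset from India. Our key learning points include the following. First, there is no particular advantage in using feature extractor layers trained on histology if the feature extractor is going to be fine-tuned on the target dataset. Second, selecting patches with high cellularity, presumably capturing tumor regions, is not always helpful, because signs of a disease class may be present in the tumor-adjacent stroma.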
